Among numerous point mutation differences between SARS-CoV-2 and the bat
RaTG13 coronavirus, only the 12-nucleotide furin cleavage site (FCS)
exceeds 3 nucleotides. A BLAST search revealed that a 19-nucleotide portion
of the SARS-CoV-2 genome encompassing the furin cleavage site is a 100%
complementary match to a codon-optimized proprietary sequence that is the
reverse complement of the human mutS homolog (MSH3). The reverse complement
sequence present in SARS-CoV-2 may occur randomly, but other possibilities
must be considered. Recombination in an intermediate host is an unlikely
explanation. Single-stranded RNA viruses such as SARS-CoV-2 utilize
negative-strand RNA templates in infected cells, which might lead through
copy choice recombination with a negative-sense SARS-CoV-2 RNA to the
integration of the MSH3 negative strand, including the FCS, into the viral
genome. In any case, the presence of the 19-nucleotide-long RNA sequence
including the FCS with 100% identity to the reverse complement of the MSH3
mRNA is highly unusual and requires further investigation.


Introduction 

Based on a recent publication describing insertion variants of SARS-CoV-2 (1),
we would like to draw attention to our recent findings related to the
sequence of the furin cleavage site (FCS) in the SARS-CoV-2 Spike (S) protein.
The SARS-CoV-2 causing the COVID-19 pandemic (2) has 82.3% amino acid
identity to bat coronavirus SL-CoVZC45, 77.2% amino acid identity to
SARS-CoV, and 96.2% genome sequence identity to bat coronavirus RaTG13. While
numerous point mutation differences exist between SARS-CoV-2 and RaTG13,
only one insertion or dissimilarity exceeding 3 nucleotides (nt) has been
discovered: a 12-nucleotide insertion coding for four amino acids (aa 681-684,
PRRA) in the SARS-CoV-2 S protein. This polybasic FCS differentiates
SARS-CoV-2 from other b-lineage betacoronaviruses and from any other
sarbecovirus (3). An FCS addition enhanced the infectivity of SARS-CoV-2 in
2019 (4). The absence of this FCS results in attenuated SARS-CoV-2 variants
useful for animal vaccination, accentuating its relevance to human infection
(5). This FCS is vital for human and ferret transmission (6), expands viral
tropism to human cells (7), and is requisite for severe disease in two animal
models of SARS-CoV-2 (8).


SARS-CoV-2 Spike Protein and MSH3 


A peculiar feature of the nucleotide sequence encoding the PRRA furin
cleavage site in the SARS-CoV-2 S protein is its two consecutive CGG codons.
This arginine codon is rare in coronaviruses: relative synonymous codon usage
(RSCU) of CGG in pangolin CoV is 0, in bat CoV 0.08, in SARS-CoV 0.19, in
MERS-CoV 0.25, and in SARS-CoV-2 0.299 (9).
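For readers unfamiliar with the metric, RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below illustrates that definition for the six arginine codons. The function name `arginine_rscu` and the toy input sequence are assumptions made for illustration; they are not taken from reference (9) and do not reproduce its values.

```python
from collections import Counter

# The six synonymous codons for arginine in the standard genetic code.
ARG_CODONS = ("CGT", "CGC", "CGA", "CGG", "AGA", "AGG")


def arginine_rscu(cds: str) -> dict:
    """Return the RSCU of each arginine codon in an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3          # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in ARG_CODONS)
    total = sum(counts.values())
    if total == 0:                            # no arginine codons at all
        return {c: 0.0 for c in ARG_CODONS}
    expected = total / len(ARG_CODONS)        # count under equal usage
    return {c: counts[c] / expected for c in ARG_CODONS}


# Hypothetical toy sequence: CGG CGG AGA AGA CGT AGA (not real viral data).
toy = "CGGCGGAGAAGACGTAGA"
rscu = arginine_rscu(toy)
print({codon: round(value, 2) for codon, value in rscu.items()})
```

An RSCU of 1 means a codon is used exactly as often as expected under equal usage, so values well below 1, such as those quoted for CGG above, indicate a codon that is avoided.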

A BLAST search for the 12-nucleotide insertion led us to a 100% reverse match
in a proprietary sequence (SEQ ID11652, nt 2751-2733) found in US patent
9,587,003, filed on Feb. 4, 2016 (10) (Figure 1). Examination of SEQ ID11652
revealed that the match extends beyond the 12-nucleotide insertion to a
19-nucleotide sequence: 5'-CTACGTGCCCGCCGAGGAG-3' (nt 2733-2751 of SEQ
ID11652), such that the resulting mRNA would have
3'-GAUGCACGGGCGGCUCCUC-5', or equivalently 5'-CU CCU CGG CGG GCA CGU AG-3'
(nucleotides 23547-23565 in the SARS-CoV-2 genome, in which the four codons
CCU CGG CGG GCA yield PRRA, amino acids 681-684 of its spike protein).
A match of this kind is very rare in the NCBI BLAST database.
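The strand relationship described above can be checked mechanically. The following is a minimal sketch, assuming only the two sequences quoted in the text; the helper name `reverse_complement` and the three-entry codon table are illustrative choices, and the table covers only the codons needed here.

```python
# Complement mapping for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


# Partial codon table: only the codons that occur in the 12-nt insertion.
CODON_TABLE = {"CCT": "P", "CGG": "R", "GCA": "A"}

# nt 2733-2751 of SEQ ID11652, as quoted in the text.
patent_strand = "CTACGTGCCCGCCGAGGAG"

# Sequence as it reads in the SARS-CoV-2 genome (DNA alphabet).
viral_sense = reverse_complement(patent_strand)
print(viral_sense)

# The 12-nt insertion sits after the first two bases of the 19-mer.
insertion = viral_sense[2:14]
peptide = "".join(CODON_TABLE[insertion[i:i + 3]] for i in range(0, 12, 3))
print(insertion, peptide)
```

Run as written, the first line printed is the 19-mer reported for SARS-CoV-2 (CTCCTCGGCGGGCACGTAG) and the second shows the insertion CCTCGGCGGGCA translating to PRRA.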


FIGURE 1
Figure 1. The origin of the furin sequence in SARS-CoV-2. Comparison of the
protein sequences at the S1/S2 junction in SARS-CoV, RaTG13, and SARS-CoV-2
demonstrating the presence of the furin cleavage site (FCS) PRRA only in
SARS-CoV-2. Based on a BLAST search of the 12-nucleotide stretch coding for
the FCS PRRA, a 19-nucleotide-long identical sequence was identified in the
patented (US 9,587,003) sequence SEQ ID11652. SEQ ID11652 is transcribed to
an MSH3 mRNA that appears to be codon optimized for humans. This
19-nucleotide sequence, including 12 nucleotides coding for the FCS PRRA,
present in the human MSH3 gene might have been introduced into the
SARS-CoV-2 genome by the illustrated copy choice recombination mechanism in
SARS-CoV-2 infected human cells overexpressing the MSH3 gene.

The correlation between this SARS-CoV-2 sequence and the reverse
complement of a proprietary mRNA sequence is of uncertain origin.
Conventional biostatistical analysis indicates that the probability of this
sequence randomly being present in a 30,000-nucleotide viral genome is
$3.21 \times 10^{-11}$ (Figure 2).


FIGURE 2

\[
P_1 = P(\text{19-nucleotide sequence appears in a 30,000-nt genome})
    = (30{,}000 - 18) \times \frac{1}{4^{19}}
\]

\[
P_2 = P(\text{19-nucleotide sequence appears in a 3,300-nt sequence})
    = (3{,}300 - 18) \times \frac{1}{4^{19}} \approx 1.19 \times 10^{-8}
\]

\[
P_{2c} = P(\text{19-nucleotide sequence appears in one of the 24,712 sequences of 3,300 nucleotides})
       = 24{,}712 \times P_2 \times (1 - P_2)^{24{,}711}
       = 24{,}712 \times 1.19 \times 10^{-8} \times (1 - 1.19 \times 10^{-8})^{24{,}711}
       \approx 0.00029
\]

\[
P_3 = P(\text{identical sequence appears once in the 30,000-nt genome and in a library of 24,712 sequences of approximately 3,300 nucleotides each})
    = P_1 \times P_{2c} \approx 3.21 \times 10^{-11}
\]

Figure 2. Calculations of the probability of natural occurrence of the 19-nt
sequence under study. The SARS-CoV-2 genome is ~30,000 nucleotides long (P1).
The patented sequence is ~3,300 nucleotides long (P2). The patented library
encompasses 24,712 sequences of varying lengths, with the median length being
in the range of 3,300 nucleotides. Conventional probability calculations are
given of the probability of the presence of a 19-nucleotide sequence in the
human genome and in one of the patented library sequences.
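The arithmetic summarized in Figure 2 can be reproduced in a few lines. This is a sketch of the calculation as the figure states it, assuming equiprobable and independent nucleotides; the variable names are illustrative, and the code makes no claim about whether these modelling assumptions are appropriate.

```python
SEQ_LEN = 19             # length of the matching sequence
GENOME_LEN = 30_000      # approximate SARS-CoV-2 genome length
PATENT_SEQ_LEN = 3_300   # approximate length of one patented sequence
LIBRARY_SIZE = 24_712    # number of sequences in the patented library

# Probability that one fixed 19-nt window matches, with 4 equally likely bases.
p_window = 0.25 ** SEQ_LEN

# P1 and P2: number of 19-nt windows multiplied by the per-window probability.
p1 = (GENOME_LEN - (SEQ_LEN - 1)) * p_window
p2 = (PATENT_SEQ_LEN - (SEQ_LEN - 1)) * p_window

# P2c: binomial probability that exactly one library sequence contains it.
p2c = LIBRARY_SIZE * p2 * (1 - p2) ** (LIBRARY_SIZE - 1)

# P3: product of P1 and P2c, which treats the two events as independent.
p3 = p1 * p2c

print(f"P1  = {p1:.3g}")
print(f"P2  = {p2:.3g}")
print(f"P2c = {p2c:.3g}")
print(f"P3  = {p3:.3g}")
```

With unrounded intermediates the product comes out at roughly 3.2 x 10^-11, in line with the figure; small differences in the last digit depend on where intermediate values are rounded.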


The proprietary sequence SEQ ID11652, read in the forward direction, encodes
a 100% amino acid match to the human mutS homolog 3 (MSH3) (9). MSH3
is a DNA mismatch repair protein (part of the MutS beta complex) (11). SEQ
ID11652 is transcribed to an MSH3 mRNA that appears to be codon optimized
for humans (12). We did not find the 19-nucleotide sequence
CTCCTCGGCGGGCACGTAG in any eukaryotic or viral genomes except
SARS-CoV-2 with 100% coverage and identity in the BLAST database
(Supplementary Tables 1-3).


Discussion 

MSH3 replacement with a codon-optimized mRNA sequence for human
expression likely has applications in cancers with mismatch repair
deficiencies. While a portion of a reverse complement sequence being present
in SARS-CoV-2 could be a random coincidence, other possibilities merit
consideration.


Overexpression of MSH3 is known to interfere with mismatch repair (MSH2 
sequestration from the MutS alpha complex comprising MSH2 and MSH6 
results in MSH6 degradation and MutS alpha depletion) (13), which holds 
virologic importance. Induction of DNA mismatch repair deficiency results in 
permissiveness of influenza A virus (IAV) infection of human respiratory cells 
and increased pathogenicity (14). Mismatch repair deficiency may extend 
shedding of SARS-CoV-2 (15, 16). 

The absence of CTCCTCGGCGGGCACGTAG from any eukaryotic or viral
genome in the BLAST database makes recombination in an intermediate host
an unlikely explanation for its presence in SARS-CoV-2. A
human-codon-optimized mRNA encoding a protein 100% homologous to human MSH3
could, during the course of viral research, inadvertently or intentionally
induce mismatch repair deficiency in a human cell line, which would increase
susceptibility to SARS-like viral infection. Infection of SEQ
ID11652-MSH3-transduced human cells by a SARS-like virus could enable copy
choice recombination (15). Replication of SARS-CoV-2 and other
single-stranded RNA viruses with an RNA genome of positive polarity is
initiated by the synthesis of negative-strand RNA in the cytoplasm of
infected cells (17) (Figure 1). The negative-strand RNA is a template for
synthesis of positive-strand RNA utilized for translation of non-structural
proteins, the replication and transcription complex, or new virion capsids.
Coronaviruses generate double-stranded RNA at an early stage of infection
through genomic replication and mRNA transcription (18).

Acquisition of the reverse complement FCS sequence from an overexpressed
positive-sense MSH3 mRNA could occur through copy choice recombination
with a negative-sense SARS-CoV-2 RNA intermediate (15), involving jumping
from one template to another (19) (Figure 1). The homology between
SARS-CoV-2 and other known coronaviruses is discontinuous, and most
SARS-CoV-2 sequences derive from a relatively recent common ancestor with bat
RaTG13. Moreover, similarity plots (SimPlots) have identified sudden changes
in sequence identity between SARS-CoV-2 and RaTG13, signaling potential
recombination events, which could explain the capability of SARS-CoV-2
to bind ACE2 through its RBD, which is not the case for the RaTG13 RBD
(15).

A criticism of this hypothesis is that the identified sequence is on the opposite
strand of the open reading frame in SEQ ID11652. However, cells transfected
with MSH3, which induces mismatch repair deficiency, could have targeted
double-stranded cDNA encoding SEQ ID11652. In such cells co-transfected with
a SARS-like virus, the expressed RdRp could attach to this 19-nucleotide
sequence (15) and permit integration of a fragment from the negative strand
into the viral genome, including the FCS, despite being on the opposite strand
of the open reading frame. Mismatch repair mechanisms have enabled integration
of short fragments from antisense strands in experimental models (20, 21).
Microhomology can direct recombination between MSH3 and a SARS-like
virus, which could take place at the 19-nucleotide sequence of interest.

The presence in SARS-CoV-2 of a 19-nucleotide RNA sequence encoding an
FCS at amino acid 681 of its spike protein with 100% identity to the reverse
complement of a proprietary MSH3 mRNA sequence is highly unusual.
Potential explanations for this correlation should be further investigated.

18 Comments - Christophe Brun, John Smith, Martijn Weterings, Vincent
Joseph, Seth Crosby, Fernando Cardona, Jon Burz, Zeli Zhang, JIAO SUN, Suzie
Pulte and Jf K.





Christophe Brun Really interesting, but in the introduction the sentence "An FCS
addition enhanced the infectivity of SARS Co-V-2 in 2019 (4)" makes me think the study in
reference 4 dates from 2019, which does not seem to be the case in any way. The article dates
from 2020; it relates to previously published articles of course but... It can create a false
sense that SARS Co-V-2 was studied in 2019.

3:38 AM, 22 February 2022


John Smith Probability theory states that P(A and B) = P(A) x P(B) only if events A and B
are independent events.

The calculations shown in Figure 2 of the article are:

P(19nt sequence in humans appears in the virus's 30,000nt genome)
= (30,000 - 18) x 10^-7

These two probabilities are multiplied, but there is no evidence given in the article that
a sequence appearing in humans is independent from it appearing in the virus. In fact, the
opposite may be true. The virus is not a random sequence. The virus is being studied because
it matches the sequence in humans, making it more able to infect humans. These are not shown
to be independent events, and thus it is incorrect to multiply their probabilities.

P(19nt sequence in humans appears in one of the 24712 sequences of 3300nt patented by Moderna)
= 24712 x P2 x (1-P2)^24711

The article does not provide any evidence that these are independent events either. On the
contrary, a pharmaceutical company which aims to treat illnesses in humans is probably more
likely to patent sequences from the human genome. Thus, these are likely not independent
events either, and their probabilities cannot be multiplied.

These issues may increase the final probability of 3.21 x 10^-11 shown in the article,
making the article inaccurate. If possible, would you be able to offer an explanation?

9:05 PM, 23 February 2022 


User 


Comment deleted on 3:47 AM, 24 February 2022


Martijn Weterings John Smith points out how the probability theory is a simplification,
but even if we would consider the nucleotides completely random and independent then still
the conventional biostatistical analysis contains several major errors. Some of them
increase the probability, others decrease the probability.

#### PROBLEM 1 ####

The probabilities P1 and P2: the occurrence of an m-length nucleotide sequence in a random
nucleotide sequence of length n. The computation computes the probability of the sequence
of length m in a particular sequence, (0.25)^m, and multiplies this with n-(m-1).

(n-m+1)*(0.25)^m

We can very clearly show that this is wrong by using a smaller problem. What is the
probability that a nucleotide T (length m=1) occurs in a random genome of length n=4? It is
*not* 4*0.25^1 = 1. That is the expectation value for the number of occurrences. But the
probability for 1 or more occurrences is more like:

1-(1-(0.25)^m)^(n-m+1)

And... that is getting closer, but *also* wrong. This considers the probability as n-m+1
independent flips of a coin with probability (0.25)^m. But the flips are not independent.

To get a more precise result we can consider the problem: A fair die is rolled 1,000 times.
What is the probability of rolling the same number 5 times in a row?
https://stats.stackexchange.com/questions/492000/a-fair-die-is-rolled-1-000-times-what-is-the-probability-of-rolling-the-same-nu

The solution of this problem can be computed in several ways. The fastest method is to
compute it with a Markov Chain. Below is a program in R that does this computation:

library(matrixcalc)   # provides matrix.power

### create Markov Chain matrix
M = matrix(rep(0,20*20),20)
M[1,1:19] = 3/4                      # a mismatch sends any partial match back to state 1
for (i in 1:19) {M[i+1,i] = 1/4}     # a match advances the partial match by one
M[20,20] = 1                         # state 20 (full 19-nt match) is absorbing

### compute probability
start <- c(1,rep(0,19))
matrix.power(M,3300) %*% start

The result is 9.0 * 10^-9 (8.955794e-09) which is a *smaller* probability than the
11.9 * 10^-9 from the article, but not a big difference.

#### PROBLEM 2 ####

The article computes "the probability of this sequence randomly being present in a
30,000-nucleotide viral genome". But this is a red herring. It is not the relevant
probability under consideration.

Namely, the article computes the probability for a *specific sequence* to occur in the viral
genome and in the database of 24712 sequences.

This is like computing the probability of rolling two six-sided dice with the same number
as p1*p2 = 1/6*1/6 = 1/36. But every fan of board games knows that the probability of the
same number on the two dice is 1/6 instead of 1/36. We can roll two ones, two twos, two
threes, two fours, two fives or two sixes. Each situation has 1/36 probability, and together
the probability is 1/6.

#### PROBLEM 3 ####

There is a problem of multiple comparisons.

The article focuses on a sequence CCTCGGCGGGCA and finds a match in a database for a piece
of length 19 around it with 2 pairs in front and 5 pairs behind it,
CT-CCTCGGCGGGCA-CGTAG. The computation of the probability is for this specific match of the
sequence CT-CCTCGGCGGGCA-CGTAG, but we could have also had other matches of length 19:
TAATTCT-CCTCGGCGGGCA, AATTCT-CCTCGGCGGGCA-C, ATTCT-CCTCGGCGGGCA-CG,
TTCT-CCTCGGCGGGCA-CGT, TTCT-CCTCGGCGGGCA-CGT, TCT-CCTCGGCGGGCA-CGTA,
CT-CCTCGGCGGGCA-CGTAT, T-CCTCGGCGGGCA-CGTATG, CCTCGGCGGGCA-CGTATGT, as well as the
sequences in reverse order.

This is a complicated multiple comparisons problem which needs to be computed more exactly
(it is complicated due to the correlations between the individual cases) but it is certain
that it will considerably *increase* the probability.

#### Conclusion ####

The problem 1 will decrease the probability, but not by much. The problems 2 and 3 will
increase the probability and by a lot.

Exact computations need to be done for problem 3 but it is probably gonna be some order
times of the probability P2c (which in the altered computation is the slightly lower
0.0002212666 instead of 0.00029).

3:47 AM, 24 February 2022 
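The Problem 1 comparison in the comment above can also be reproduced without a matrix library. The sketch below is an assumed Python port of the same 20-state chain, not the commenter's own code; the function names are illustrative. Like the R version, it uses the simplification that any mismatch resets the partial match to zero, which ignores self-overlap within the 19-mer.

```python
def naive_estimate(n: int, m: int, p: float = 0.25) -> float:
    """Expected number of matches, as used for P1 and P2 in the article."""
    return (n - m + 1) * p ** m


def markov_run_probability(n: int, m: int, p: float = 0.25) -> float:
    """Probability of completing m consecutive matches within n positions.

    state[k] is the probability of currently holding a partial match of
    length k; state[m] is absorbing (the full match has already occurred).
    """
    state = [0.0] * (m + 1)
    state[0] = 1.0
    for _ in range(n):
        nxt = [0.0] * (m + 1)
        nxt[m] = state[m]                     # absorbing state keeps its mass
        for k in range(m):
            nxt[k + 1] += state[k] * p        # next base matches: advance
            nxt[0] += state[k] * (1.0 - p)    # next base mismatches: reset
        state = nxt
    return state[m]


# Small example from the comment: a single base (m=1) in a genome of n=4.
print(naive_estimate(4, 1), markov_run_probability(4, 1))

# The case discussed for P2: a 19-mer in a 3,300-nt sequence.
print(naive_estimate(3300, 19), markov_run_probability(3300, 19))
```

For the small example the naive formula returns 1 while the chain returns 1 - 0.75^4, which is the point the comment makes. For the 19-mer case the chain gives a value close to the 8.955794e-09 reported from the R program.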


Vincent Joseph Relevant comments here:
https://sciencebasedmedicine.org/the-return-of-the-revenge-of-covid-19-mrna-vaccines-permanently-alter-your-dna-and-lab-leak
Frontiers is sadly becoming something it should not be.

12:59 PM, 03 March 2022


Seth Crosby That stretch of 19 bases (CTACGTGCCCGCCGAGGAG) is not unique to
SarsCoV2 and Moderna; it occurs in a number of microorganisms. Using BLAST, a simple
online tool, I compared it to all known genomes (excluded SARSCoV2 to see what other
organisms contain it). Dozens of perfect hits. Try it yourself!
https://blast.ncbi.nlm.nih.gov/Blast.cgi

7:16 AM, 04 March 2022





Fernando Cardona "The absence of CTCCTCGGCGGGCACGTAG from any eukaryotic or
viral genome in the BLAST database makes recombination in an intermediate host an
unlikely explanation for its presence in SARS-CoV-2". The last assertion, taken from the
discussion in that paper, is the result of a poor use of the BLAST tool. Indeed, more careful
searches with this 19-nucleotide-long sequence in BLAST, using stringent parameters
(100% identity and 100% coverage) and restricted to different phyla/genus/species (virus
excluding SARS-CoV-2, birds, bacteria, bats or reference RNA sequences), provide
numerous hits. This demonstrates the existence of this exact sequence in numerous other
eukaryotes, prokaryotes or viruses. The results of our basic BLAST search certainly
question the idea that this could only happen when a coronavirus infected a human being,
as the paper seems to suggest.

3:03 AM, 07 March 2022


User


Comment deleted on 7:52 AM, 07 March 2022


User 


Comment deleted on 7:52 AM, 07 March 2022 


Fernando Cardona CTCCTCGGCGGGCACGTAG
- Virus, excluding SARS-CoV-2, RefSeq representative genomes
  100% identity and 100% coverage: phages, herpesvirus, poxvirus, halovirus,
  adenovirus, ... (2372 sequences)
  e-value<1: Pandoravirus (2), Orf virus, Murid herpesvirus 1, Murid betaherpesvirus 1
- Virus, RefSeq representative genomes, excluding all phages (8 types)
- RefSeq representative genomes, birds. 100% identity and 100% coverage
  Chaetura, Aquila, Ansa, Oxyura, Aythya, Cygnus, Serinus, Gallus
- RefSeq representative genomes, Bacteria. 100% identity and 100% coverage: 68 sequences
- Reference RNA sequences. 100% identity and 100% coverage: Saprolegnia parasitica
  (Protista), Saprolegnia diclina (Protista), Chaetura pelágica (bird).
- Bats. 100% identity and 100% coverage: Pipistrellus, Molossus, Phyllostomus,
  Rousettus, Pteropus (14 sequences).

7:53 AM, 07 March 2022


Jon Burg Is anyone else floored that the reviewers of this manuscript allowed authors to
make these claims and did not require them to follow up with any actual lab work?

11:22 AM, 10 March 2022





Zeli Zhang Totally agree with Seth Crosby, Blast the 19 bases
(CTACGTGCCCGCCGAGGAG) by yourself, tons of hits.
The conclusions in the paper do not make any sense. It is misleading.

9:53 AM, 23 March 2022


User 


Comment deleted on 1:33 AM, 24 March 2022


User


Comment deleted on 1:53 AM, 24 March 2022


JIAO SUN Dear Balamurali, your statement "We did not find the 19-nucleotide sequence
CTCCTCGGCGGGCACGTAG in any eukaryotic or viral genomes except SARS-CoV-2 with
100% coverage and identity in the BLAST database" is not right. We find a sequence of
'Saprolegnia parasitica' that is a 100% match to your 19-nt sequence. Genbank Accession:
DN616748, location: 304-286 (revert).

10:01 AM, 24 March 2022


User 


Comment deleted on 1:00 AM, 25 March 2022 


Suzie Pulte Jiao Sun, Zeli Zhang, Vincent Joseph & Fernando Cardona, not sure which
BLAST database you're using but, the one hosted by NIH shows ZERO hits when you search
for the 19 nucleoside sequence (CTCCTCGGCGGGCACGTAG) found in the Moderna patent
when you EXCLUDE SARS-CoV-2 (taxid:2697049) from the results, meaning, the only
virus in their database that shares that same sequence IS SARS-COV-2. Your comments are
HIGHLY disingenuous, and honestly quite silly when you know full well that anyone with an
internet connection and one working finger can do a BLAST search, on their own, and find
out that your statements are incorrect. Here is a REAL blast search for
"CTCCTCGGCGGGCACGTAG":

https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=4BBM56H1016 And here are the
results once you filter to EXCLUDE SARS-CoV-2 (taxid:2697049):
https://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Get&RID=4BCEPK5X0o1R You get ZERO
HITS, contrary to what you've said. Note: As you know, these query links expire. These
will expire on 04-01-2022 @ 21:37 pm, so get them while they are hot and maybe think
twice before saying things that aren't true.

11:06 AM, 31 March 2022


Jf K Suzie, Jiao Sun gave the Genbank accession ID. I guess it is not hard to verify the
result: https://www.ncbi.nlm.nih.gov/nuccore/DN616748.1/. I'm not sure which database
he was using, but when I use the database "refseq_rna", I also find a sequence of
'Saprolegnia parasitica' which has a 100% match. Its sequence can be found here:

https://www.ncbi.nlm.nih.gov/nucleotide/XM_012354388.1?report=genbank&log$=nuclalign&blast_rank=2&RID=4D740XKN016

1:43 AM, 01 April 2022
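The disagreement in this thread turns partly on strand and search settings, so it may help to see the exact-match check done by hand. The sketch below is an illustration, not part of any commenter's workflow: it searches one 60-nt row of XM_012354388.1 (positions 241-300, as it appears in the record reproduced on this page) for the 19-mer on both strands. The function names and the tuple format of the result are assumptions made for this example.

```python
# Complement mapping for DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def find_on_both_strands(query: str, subject: str, subject_start: int = 1):
    """Return exact hits as (strand, start, end) in 1-based subject coordinates."""
    subject = subject.upper()
    probes = (("plus", query.upper()), ("minus", reverse_complement(query)))
    hits = []
    for strand, probe in probes:
        pos = subject.find(probe)
        while pos != -1:
            start = subject_start + pos
            hits.append((strand, start, start + len(probe) - 1))
            pos = subject.find(probe, pos + 1)
    return hits


# Row beginning at position 241 of XM_012354388.1, spaces removed.
row_241 = "caactcgaagcttcgcatctacgagaaggatggctacgtgcccgccgaggagctcaagga"

# The 19-mer as it reads in the SARS-CoV-2 genome.
query = "CTCCTCGGCGGGCACGTAG"

hits = find_on_both_strands(query, row_241, subject_start=241)
print(hits)
```

On this row the search reports a single minus-strand hit spanning positions 274-292. A plus-strand-only text search would miss it, which is one way two readers can reach opposite conclusions about the same record.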
https://www.ncbi.nlm.nih.gov/nuccore/DN616748.1/

SPM12H8 Saprolegnia parasitica ATCC90214 mycelium library Saprolegnia parasitica cDNA, mRNA sequence
GenBank: DN616748.1

LOCUS       DN616748                 517 bp    mRNA    linear   EST 13-FEB-2011
DEFINITION  SPM12H8 Saprolegnia parasitica ATCC90214 mycelium library
            Saprolegnia parasitica cDNA, mRNA sequence.
ACCESSION   DN616748
VERSION     DN616748.1
DBLINK      BioSample: SAMN00176110
KEYWORDS    EST.
SOURCE      Saprolegnia parasitica
  ORGANISM  Saprolegnia parasitica
            Eukaryota; Sar; Stramenopiles; Oomycota; Saprolegniales;
            Saprolegniaceae; Saprolegnia.
REFERENCE   1  (bases 1 to 517)
  AUTHORS   Torto-Alalibo,T., Tian,M., Gajendran,K., Waugh,M.E., van West,P.
            and Kamoun,S.
  TITLE     Expressed sequence tags from the oomycete fish pathogen Saprolegnia
            parasitica reveal putative virulence factors
  JOURNAL   BMC Microbiol. 5 (1), 46 (2005)
   PUBMED   15076392
COMMENT     Contact: Kamoun S
            Department of Plant Pathology
            The Ohio State University-OARDC
            1680 Madison Avenue, Wooster, OH 44691-4096, USA
            Tel: 330 263 3847
            Fax: 330 263 3841
            Email: Kamoun.1@osu.edu.
FEATURES             Location/Qualifiers
     source          1..517
                     /organism="Saprolegnia parasitica"
                     /mol_type="mRNA"
                     /isolate="ATCC90214"
                     /db_xref="taxon:101203"
                     /tissue_type="mycelium"
                     /clone_lib="SAMN00176110 Saprolegnia parasitica ATCC90214
                     mycelium library"
                     /dev_stage="Grown on GYM medium for 29 days till media was
                     almost depleted to mimic stress conditions"
                     /note="Vector: pSPORT1; Site 1: SalI (G/TCGAC); Site 2:
                     NotI (GC/GGCCGC)"
ORIGIN
        1 tcgacccacg cgtccgcgga cgcgtgggct ggtaccgacg aacatgtcct tggtgtcccg
       61 caacgttgcc gcgatcaagg tcgctggcgc ccgtggcatg gccacggaga agcagattct
      121 catgcgtatc aacgccacgt ccaacatcgc caagatcacc aagtcgatga agatggtgtc
      181 ggccgccaag atgcgtggtg ccgagcgtcg catgaacgac ggctgcccgt tcgccacgtg
      241 gttggaaaac accaactcga agcttcgcat ctacgagaag gatggctacg tgcccgccga
      301 ggagcttaag gaaggccaca acacgttcgt gcccatcacg tccgaccgtg gtctctgcgg
      361 cggttgcaac tcgtacattg ccaaggccat gcgcctccag atcaacgacg tgacggagaa
      421 ctcgagcatc ttcacgattg gtgacaaggg ccgcggccag cttcgccgca cgcacggcaa
      481 gtacatcatt ggtaacgcca cggagacgtg gatcaac

Saprolegnia parasitica CBS 223.65 ATP synthase F1 mRNA
NCBI Reference Sequence: XM_012354388.1

LOCUS       XM_012354388            1072 bp    mRNA    linear   PLN 22-APR-2015
DEFINITION  Saprolegnia parasitica CBS 223.65 ATP synthase F1 mRNA.
ACCESSION   XM_012354388
VERSION     XM_012354388.1
DBLINK      BioProject: PRJNA280969
            BioSample: SAMN02981252
KEYWORDS    RefSeq.
SOURCE      Saprolegnia parasitica CBS 223.65
  ORGANISM  Saprolegnia parasitica CBS 223.65
            Eukaryota; Sar; Stramenopiles; Oomycota; Saprolegniales;
            Saprolegniaceae; Saprolegnia.
REFERENCE   1  (bases 1 to 1072)
  AUTHORS   Jiang,R.H., de Bruijn,I., Haas,B.J., Belmonte,R., Lobach,L.,
            Christie,J., van den Ackerveken,G., Bottin,A., Bulone,V.,
            Diaz-Moreno,S.M., Dumas,B., Fan,L., Gaulin,E., Govers,F.,
            Grenville-Briggs,L.J., Horner,N.R., Levin,J.Z., Mammella,M.,
            Meijer,H.J., Morris,P., Nusbaum,C., Oome,S., Phillips,A.J., van
            Rooyen,D., Rzeszutek,E., Saraiva,M., Secombes,C.J., Seidl,M.F.,
            Snel,B., Stassen,J.H., Sykes,S., Tripathy,S., van den Berg,H.,
            Vega-Arreguin,J.C., Wawra,S., Young,S.K., Zeng,Q.,
            Dieguez-Uribeondo,J., Russ,C., Tyler,B.M. and van West,P.
  TITLE     Distinctive expansion of potential virulence genes in the genome of
            the oomycete fish pathogen Saprolegnia parasitica
  JOURNAL   PLoS Genet. 9 (6), E1003272 (2013)
   PUBMED   23185293
REFERENCE   2  (bases 1 to 1072)
  CONSRTM   NCBI Genome Project
  TITLE     Direct Submission
  JOURNAL   Submitted (22-APR-2015) National Center for Biotechnology
            Information, NIH, Bethesda, MD 20894, USA
REFERENCE   3  (bases 1 to 1072)
  AUTHORS   Russ,C., Nusbaum,C., Tyler,B., van West,P., Dieguez-Uribeondo,J.,
            de Bruijn,I., Young,S.K., Zeng,Q., Gargeya,S., Alvarado,L.,
            Griggs,A., Gujja,S., Heilman,E., Heiman,D.,
            Mehta,T., Neiman,D., Pearson,M., Roberts,A., Saif,S.,
  CONSRTM   The Broad Institute Genome Sequencing Platform
  TITLE     Direct Submission
  JOURNAL   Submitted (14-FEB-2011) Broad Institute of MIT and Harvard, 7
            Cambridge Center, Cambridge, MA 02142, USA
COMMENT     PROVISIONAL REFSEQ: This record has not yet been subject to final
            NCBI review. This record is derived from an annotated genomic
            sequence (NW_012156553).
FEATURES             Location/Qualifiers
     source          1..1072
                     /organism="Saprolegnia parasitica CBS 223.65"
                     /mol_type="mRNA"
                     /strain="CBS 223.65"
                     /db_xref="taxon:695850"
                     /chromosome="Unknown"
     gene            1..1072
                     /locus_tag="SPRG_15329"
                     /db_xref="GeneID:24137067"
     CDS             32..928
                     /locus_tag="SPRG_15329"
                     /codon_start=1
                     /product="ATP synthase F1"
                     /protein_id="XP_012209778.1"
                     /db_xref="GeneID:24137067"
                     /translation="MSLVSRNVAAIKVAGARGMATEKQILMRINATSNIAKITKSMKM
                     VSAAKMRGAERRMNDGRPFATWLENTNSKLRIYEKDGYVPAEELKEGHNTFVPITSDR
                     GLCGGCNSYIAKAMRLQINDVTENSSIFTIGDKGRGQLRRTHGKYIIGNATETWINPT
                     NFAKASALAEVVLSMTPADEKLHVIFNKFQSAILYQQSIRTINTDPETYADYELEPDN
                     KEEVLLDLKEFQLATAIFHGMLESNTSEESSRMTAMENASSNASDLISSLRLVYNKAR
                     QSRITTELIEIISGAASLDAKQ"
ORIGIN
        1 cttgagtttt tcatcactgg taccgacgaa catgtccttg gtgtcccgca acgttgccgc
       61 gatcaaggtc gctggcgccc gtggcatggc cacggagaag cagattctca tgcgtatcaa
      121 cgccacgtcc aacatcgcca agatcaccaa gtcgatgaag atggtgtcgg ccgccaagat
      181 gcgtggtgcc gagcgtcgca tgaacgacgg [illegible] gccacgtggt tggaaaacac
      241 caactcgaag cttcgcatct acgagaagga tggctacgtg cccgccgagg agctcaagga
      301 aggccacaac acgtttgtgc ccatcacgtc cgaccgtggt ctctgcggcg gttgcaactc
      361 gtacattgcc aaggccatgc gcctccagat caacgacgtg acggagaact cgagcatctt
      421 cacgattggt gacaagggcc gcggccagct tcgccgcacg cacggcaagt acatcattgg
      481 taacgccacg gagacgtgga tcaacccgac caactttgcc aaggcctcgg cgctcgccga
      541 ggtcgtgctc tcgatgaccc cggcggacga gaagctccac gtgatcttca acaagttcca
      601 gtcggcgatt ctgtaccagc agtcgatccg cacgatcaac acggaccccg agacgtacgc
      661 cgactatgag ctcgagccgg acaacaagga ggaagtcctc ctcgacctca aggagttcca
      721 gctcgcgacc gccatcttcc acggcatgct cgagtccaac acgtcggagg agtcgtcgcg
      781 catgacggcc atggagaacg cgtcgagcaa cgcgtccgac ttgatctcga gcctccgcct
      841 cgtgtacaac aaggcccgcc agtcgcgcat tacgaccgag ctcatcgaaa ttatttccgg
      901 cgcggcttcg ttggacgcca agcagtaaga agcagaggac gaagatgaga cgcacgcacg
      961 ctccgatcag ccaacgatga tgataaacgg atgtcttgtg caacacgctg taccttgaat
     1021 gtgacttggc cttattgcca ccaaagacct cgatcgatag tccaccagct


